3 research outputs found

    Beyond Reward: Offline Preference-guided Policy Optimization

    This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a variant of conventional reinforcement learning that dispenses with the need for online interaction or specification of reward functions. Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories to extract the dynamics and task information, respectively. Since the dynamics and task information are orthogonal, a naive approach would involve using preference-based reward learning followed by an off-the-shelf offline RL algorithm. However, this requires the separate learning of a scalar reward function, which is assumed to be an information bottleneck of the learning process. To address this issue, we propose the offline preference-guided policy optimization (OPPO) paradigm, which models offline trajectories and preferences in a one-step process, eliminating the need for separately learning a reward function. OPPO achieves this by introducing an offline hindsight information matching objective for optimizing a contextual policy and a preference modeling objective for finding the optimal context. OPPO further integrates a well-performing decision policy by optimizing the two objectives iteratively. Our empirical results demonstrate that OPPO effectively models offline preferences and outperforms prior competing baselines, including offline RL algorithms performed over either true or pseudo reward function specifications. Our code is available on the project website: https://sites.google.com/view/oppo-icml-2023
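The naive two-stage baseline mentioned in this abstract first fits a scalar reward function to the preference labels. A minimal sketch of that step follows, assuming the Bradley-Terry preference model that is commonly used in preference-based RL; the abstract does not name the model, and this is not OPPO itself, which avoids a separate reward function. The names `preference_loss`, `linear_reward`, and the toy trajectories are hypothetical and exist only for illustration.

```python
import numpy as np


def preference_loss(reward_fn, traj_a, traj_b, label):
    """Bradley-Terry negative log-likelihood for one trajectory pair.

    label is 1.0 if traj_a is preferred and 0.0 if traj_b is preferred.
    Assumption: the preference probability depends only on the difference
    of summed per-state rewards.
    """
    ret_a = sum(reward_fn(s) for s in traj_a)
    ret_b = sum(reward_fn(s) for s in traj_b)
    logit = ret_a - ret_b  # P(a preferred) = sigmoid(logit)
    # Numerically stable binary cross-entropy written on the logit:
    # log(1 + exp(logit)) - label * logit
    return np.logaddexp(0.0, logit) - label * logit


# Hypothetical linear reward over 2-D states, for illustration only.
weights = np.array([1.0, 0.0])


def linear_reward(state):
    return float(weights @ state)


traj_a = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]  # summed reward 3.0
traj_b = [np.array([0.0, 1.0])]                        # summed reward 0.0

loss_if_a_preferred = preference_loss(linear_reward, traj_a, traj_b, 1.0)
loss_if_b_preferred = preference_loss(linear_reward, traj_a, traj_b, 0.0)
```

Under this model the loss is small when the label agrees with the ordering implied by the reward and large when it disagrees. Minimizing it over the reward parameters yields the learned reward that an off-the-shelf offline RL algorithm would then consume.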

    RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

    Developing intelligent robotic systems that can adapt quickly to unseen situations in the wild is one of the critical challenges in pursuing autonomous robotics. Although impressive progress has been made in walking stability and skill learning for legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with a large repertoire of skills needed to survive and can quickly acquire new ones by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG), for organizing a large number of fundamental robot skills and dexterously reusing them for fast adaptation. Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of a large number of dynamic behavioral skills instead of the static knowledge in KG, and it enables the discovery of implicit relations between the learning context and the acquired skills of robots, serving as a starting point for understanding subtle patterns in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference for new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.

    Impact of extracorporeal membrane oxygenation in immunocompetent children with severe adenovirus pneumonia

    Background: Severe adenovirus (Adv.) pneumonia can cause significant mortality in young children. There has been no worldwide consensus on the impact of extracorporeal membrane oxygenation (ECMO) in immunocompetent children with severe Adv. pneumonia. This study aimed to assess the impact of ECMO in immunocompetent children with severe Adv. pneumonia. Methods: This study evaluated the medical records of 168 hospitalized children with severe Adv. pneumonia at the Guangzhou Women and Children’s Medical Center between 2019 and 2020. Nineteen patients in the ECMO group and 149 patients in the non-ECMO group were enrolled. Results: Between these two groups, there were no differences in host factors such as sex and age (all P > 0.05). Significant differences were observed in shortness of breath/increased work of breathing; cyanosis; seizures; tachycardia; the partial pressure of oxygen in arterial blood (PaO2); the ratio of PaO2 to the fraction of inspired oxygen (FiO2), abbreviated P/F; white blood cell, lymphocyte, monocyte, lactate dehydrogenase (LDH), serum albumin, and procalcitonin levels; and pulmonary consolidation (all P < 0.05). There were significant differences in the parameters of mechanical ventilation (MV) therapy, in complications such as respiratory failure, acute respiratory distress syndrome, and septic shock, and in length of hospitalization and death (all P < 0.05). The maximum axillary temperatures, respiratory rates, heart rates, and LDH levels after receiving ECMO were significantly lower than those before ECMO (all P < 0.05). Additionally, SpO2, PaO2, and P/F were significantly higher than those before ECMO (all P < 0.05). In MV therapy, FiO2, PIP, and PEEP were significantly lower than those before ECMO (all P < 0.05). Conclusions: In our study, the clinical conditions of the patients in the ECMO group were much more severe than those in the non-ECMO group. Our study showed that ECMO might be beneficial for patients with severe Adv. pneumonia.